Three Way Segmentation(⋆)

Authors

  • Valerio Aniello Tutore
  • Roberta Siciliano
  • Massimo Aria
Abstract

Introduction. This work is set within the context of supervised classification with tree-based models. Following the CART approach (Breiman et al., 1984), this methodology builds a tree structure that describes the dependence between a response variable and a set of explanatory variables in regression and classification problems. The advantages of binary segmentation lie in the ability to handle heterogeneous data and to analyze huge datasets, in the simplicity of implementing the tree-growing algorithm, and in the ease of interpreting the results through a tree-structured graph. The segmentation algorithm consists of a recursive partitioning of a set of statistical units into progressively finer and more homogeneous groups. The goal is to define a final partition of the dataset formed by disjoint and exhaustive subgroups, represented by terminal nodes, each of which is assigned a response value. The terminal nodes have a higher level of internal homogeneity, measured with reference to the distribution of the response variable.

Two-Stage approaches. The Two-Stage criterion (Mola and Siciliano, 1992) defines the tree structure in two main steps: first, a subset of the original predictors that best explain the response variable is identified; then, the best binary partition among those generated by that subset of predictors is identified. Starting from that idea, several two-stage approaches have been proposed in the literature for partitioning non-standard data structures, such as: FAST (Mola and Siciliano, 1997), to reduce the computational cost of analyzing huge datasets; TS-DIS (Mola and Siciliano, 2000), which uses linear discriminant functions to define a multivariate splitting criterion; and Multi-Class Budget Tree (Aria, 2005), based on a latent budget partitioning algorithm introduced to analyse fuzzy data.
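
The recursive binary partitioning described in the introduction can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes numeric predictors, a categorical response, and Gini impurity as the homogeneity measure (one common CART-style choice), and the names `gini`, `best_split`, `grow`, and `predict` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Exhaustive search: return (impurity_decrease, feature, threshold) or None."""
    n = len(rows)
    parent = gini(labels)
    best = None
    for j in range(len(rows[0])):
        values = sorted(set(r[j] for r in rows))
        # Candidate thresholds are midpoints between consecutive observed values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            gain = parent - len(left) / n * gini(left) - len(right) / n * gini(right)
            if best is None or gain > best[0]:
                best = (gain, j, t)
    return best


def grow(rows, labels, min_size=2):
    """Recursively partition the units; terminal nodes carry the majority class."""
    split = best_split(rows, labels) if len(rows) >= min_size else None
    if split is None or split[0] <= 1e-12:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": grow([r for r, _ in left], [y for _, y in left], min_size),
        "right": grow([r for r, _ in right], [y for _, y in right], min_size),
    }


def predict(tree, row):
    """Drop a unit down the tree and return the response value of its terminal node."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]
```

Each split here uses a single predictor, which is the property of classical tree-based methods that the two-stage and three-stage approaches are designed to address.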

Classical tree-based methodologies are characterized by the fact that they use one, or just a small subset, of the original predictors to define the final data partition. In certain contexts it is very important to investigate the role that each single variable plays in explaining the response. For example, in the presence of complex data structures, characterized by groups of covariates that are internally correlated with each other and hierarchically connected to a synthesis framework, as with data from customer satisfaction surveys, the need for greater interpretative value is particularly felt. From this point of view, the present work proposes an alternative approach based on a three-stage methodology, namely Three Way Segmentation (TWS).

(⋆) The present paper is financially supported by MIUR Funds 2005 awarded to R. Siciliano (Prot. N. 2005130191).

The proposed methodology. In the first stage, a splitting criterion based on discriminant functions is defined; it reduces the dimensionality of the analysis by shifting attention towards a set of latent predictors that synthesize the original variables. In the second stage, the algorithm creates one global latent variable, a synthesis of the discriminant functions obtained in the first stage. In the third stage, the methodology identifies the best split of the global latent variable with respect to the response.

The introduction of a third step, building on the TS-DIS idea, can be justified by a two-fold consideration. On the one hand, a unique global latent discriminant variable, which is a linear combination of the other latent variables that express each single dimension, makes it possible to generate binary splits to which every predictor contributes at the same time. On the other hand, taking into account the latent variables that express the several dimensions, it is possible to calculate a series of coefficients that represent the weight of the links among those latent variables, the predictors, the response variable, and the global latent variable. In other words, if the conditions for its application are verified, the addition of a third stage allows a better interpretation of the phenomenon: all the variables act simultaneously when the split is created, yet it is still possible to interpret the relevance of each of them with respect to both the response and the dimensional latent variables. The proposed methodology has been implemented in the MATLAB environment, as an additional module of the Tree Harvest Software (Aria, 2004). Several analyses on simulated and real datasets show that this technique can offer interesting results.
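
The three stages can be sketched for a single node as follows. This is only an illustration under stated assumptions, not the TWS algorithm as implemented in Tree Harvest: the abstract does not specify how the discriminant functions are estimated, so the sketch assumes a two-class response, predictor groups supplied in advance, Fisher's linear discriminant (with a small ridge term for numerical stability) in stages one and two, and a misclassification count as the split criterion in stage three. All function names are hypothetical.

```python
def mean_vec(rows):
    """Column means of a list of equal-length rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [list(a[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(m[i][c]))
        m[c], m[p] = m[p], m[c]
        for i in range(n):
            if i != c:
                f = m[i][c] / m[c][c]
                m[i] = [x - f * y for x, y in zip(m[i], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fisher_direction(rows, labels, ridge=1e-6):
    """Two-class Fisher weights w solving (S_w + ridge * I) w = m1 - m0."""
    classes = sorted(set(labels))
    assert len(classes) == 2, "this sketch assumes a two-class response"
    g0 = [r for r, y in zip(rows, labels) if y == classes[0]]
    g1 = [r for r, y in zip(rows, labels) if y == classes[1]]
    m0, m1 = mean_vec(g0), mean_vec(g1)
    p = len(rows[0])
    # Pooled within-class scatter matrix, started from the ridge term.
    sw = [[ridge if i == j else 0.0 for j in range(p)] for i in range(p)]
    for group, m in ((g0, m0), (g1, m1)):
        for r in group:
            d = [x - mu for x, mu in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    sw[i][j] += d[i] * d[j]
    return solve(sw, [u - v for u, v in zip(m1, m0)])


def project(rows, w):
    """Linear scores of each row along the weight vector w."""
    return [sum(x * c for x, c in zip(r, w)) for r in rows]


def best_threshold(scores, labels):
    """Best binary split of one score; assumes at least two distinct scores."""
    classes = sorted(set(labels))
    values = sorted(set(scores))
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        # Scores above t are assigned to the second class.
        errors = sum((s > t) != (y == classes[1]) for s, y in zip(scores, labels))
        if best is None or errors < best[0]:
            best = (errors, t)
    return best


def three_stage_split(rows, labels, groups):
    """groups: list of column-index lists, one per block of related predictors."""
    # Stage 1: one latent discriminant score per group of predictors.
    stage1 = []
    for cols in groups:
        sub = [[r[c] for c in cols] for r in rows]
        w = fisher_direction(sub, labels)
        stage1.append((w, project(sub, w)))
    # Stage 2: one global latent variable combining the stage-1 scores.
    latent = [list(z) for z in zip(*(s for _, s in stage1))]
    w_global = fisher_direction(latent, labels)
    global_scores = project(latent, w_global)
    # Stage 3: best binary split of the global latent variable.
    errors, threshold = best_threshold(global_scores, labels)
    return {
        "group_weights": [w for w, _ in stage1],
        "global_weights": w_global,
        "threshold": threshold,
        "errors": errors,
    }
```

In this sketch the returned `group_weights` and `global_weights` play a role loosely analogous to the coefficients mentioned above: every predictor enters the split through its group score, and the weights indicate how strongly each predictor and each group contributes.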

Similar resources

The Role of Self-Regulatory Approach in Iranian Learners' Lexical Segmentation: The case of authentic materials

The present research investigated the effect of self-regulatory approach (with two components of self-checking and self-efficacy) on pre-intermediate Iranian learners' lexical segmentation in listening comprehension via authentic listening comprehension texts. To achieve this purpose, the investigators administered an Oxford Placement Test (2007) to ninety-eight students of two girls’ private j...

Quantitative Comparison of SPM, FSL, and Brainsuite for Brain MR Image Segmentation

Background: Accurate brain tissue segmentation from magnetic resonance (MR) images is an important step in analysis of cerebral images. There are software packages which are used for brain segmentation. These packages usually contain a set of skull stripping, intensity non-uniformity (bias) correction and segmentation routines. Thus, assessment of the quality of the segmented gray matter (GM), ...

A Study to Improve the Response in Email Campaigning by Comparing Data Mining Segmentation Approaches in Aditi Technologies

Email marketing is increasingly recognized as an effective Internet marketing tool. In this study, a questionnaire is constructed and distributed to a sample of 146 prospects of Aditi Technologies to find the factors associated with higher response rates. The collected data is analyzed using Factor Analysis and the 11 factors, From Line, Subject Line, Personalization of the subject line, Timing...

Color Reduction in Hand-drawn Persian Carpet Cartoons before Discretization using image segmentation and finding edgy regions

In this paper, we present a method for color reduction of Persian carpet cartoons that increases both speed and accuracy of editing. Carpet cartoons are in two categories: machine-printed and hand-drawn. Hand-drawn cartoons are divided into two groups: before and after discretization. The purpose of this study is color reduction of hand-drawn cartoons before discretization. The proposed algorit...

A Three Stages Segmentation Model for a Higher Accurate off-line Arabic Handwriting Recognition

Arabic handwriting recognition is considered one of the hardest applications of OCR systems. The reason relates to the characteristics of Arabic characters and their cursive writing. Furthermore, no rules govern handwriting; different styles, sizes, and curves make the recognition process very complex. On the other hand, the key to good recognition is gettin...

Publication date: 2006